成对图像和文本的大型数据集越来越受到愿景和愿景和语言任务的通用表示。此类数据集已通过查询搜索引擎或收集HTML Alt-Text构建 - 由于Web数据是嘈杂的,因此它们需要复杂的过滤管道来维护质量。我们探索备用数据源以收集具有最小滤波的高质量数据。我们介绍Redcaps - 从Reddit收集的12M图像文本对的大规模数据集。来自Reddit的图像和标题描绘并描述了各种各样的物体和场景。我们从手动策划的FuSoddits集中收集数据,这为粗略图像标签提供给粗略图像标签,并允许我们转向数据集组合而不标记单个实例。我们展示Redcaps培训的标题模型产生了人类优选的丰富和各种标题,并学习转移到许多下游任务的视觉表现。
translated by 谷歌翻译
Quantum causality is an emerging field of study which has the potential to greatly advance our understanding of quantum systems. In this paper, we put forth a theoretical framework for merging quantum information science and causal inference by exploiting entropic principles. For this purpose, we leverage the tradeoff between the entropy of hidden cause and the conditional mutual information of observed variables to develop a scalable algorithmic approach for inferring causality in the presence of latent confounders (common causes) in quantum systems. As an application, we consider a system of three entangled qubits and transmit the second and third qubits over separate noisy quantum channels. In this model, we validate that the first qubit is a latent confounder and the common cause of the second and third qubits. In contrast, when two entangled qubits are prepared and one of them is sent over a noisy channel, there is no common confounder. We also demonstrate that the proposed approach outperforms the results of classical causal inference for the Tubingen database when the variables are classical by exploiting quantum dependence between variables through density matrices rather than joint probability distributions. Thus, the proposed approach unifies classical and quantum causal inference in a principled way.
translated by 谷歌翻译